Using the Lemmatization Technique for Phonetic Transcription in Text-to-Speech System
Abstract
This paper deals with the lemmatization technique and its use for the phonetic transcription of exceptional words. The lemmatizer is based on language morphology: it uses a lexicon of word base forms and inverts the derivation rules to obtain the lemmatization rules needed to find word bases. We describe the lemmatization algorithm and the modifications to the lemmatizer required to transcribe exceptional words. The main goal of the designed system is to reduce the memory needed for the lexicon of exceptions. The experimental results show that the size of the full lexicon can be reduced by 18.3% (English) to 98.4% (Finnish). Hence, the system is well suited to highly inflectional and agglutinative languages.
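The abstract does not specify the rule format, so the following is only a minimal sketch under assumed conventions. Each hypothetical `DerivationRule` rewrites a spelling ending and a phonetic (SAMPA-like) ending together; the lemmatization rules are obtained by inverting these rules, i.e. indexing them by the inflected ending they produce; and the lexicon stores only base forms with their transcriptions. An exceptional inflected word is then transcribed from its base form instead of being stored in full. The names `DerivationRule`, `lemmatization_rules` and `transcribe`, the rules and the transcriptions are all illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivationRule:
    """Hypothetical derivation rule: the base's spelling ending `strip` becomes
    `add`, and its phonetic ending `p_strip` becomes `p_add`."""
    strip: str
    add: str
    p_strip: str
    p_add: str


def replace_suffix(text: str, old: str, new: str) -> str:
    # Caller guarantees text ends with old; len() avoids the text[:-0] == "" pitfall.
    return text[: len(text) - len(old)] + new


def lemmatization_rules(derivations):
    """Invert derivation rules by indexing them on the inflected ending they produce."""
    index = {}
    for rule in derivations:
        index.setdefault(rule.add, []).append(rule)
    return index


def transcribe(word, base_lexicon, lemma_index):
    """Transcribe `word` via its base form; return None if no rule finds a base.
    `base_lexicon` maps base spellings to phonetic transcriptions."""
    if word in base_lexicon:
        return base_lexicon[word]
    # Try longer inflected endings first.
    for ending in sorted(lemma_index, key=len, reverse=True):
        if not word.endswith(ending):
            continue
        for rule in lemma_index[ending]:
            base = replace_suffix(word, rule.add, rule.strip)  # inverted rule
            phon = base_lexicon.get(base)
            if phon is not None and phon.endswith(rule.p_strip):
                # Re-apply the forward rule on the phonetic side.
                return replace_suffix(phon, rule.p_strip, rule.p_add)
    return None


# Toy English rules (hypothetical; they ignore /s/ vs. /z/ vs. /Iz/ plurals).
rules = [
    DerivationRule("fe", "ves", "f", "vz"),  # knife -> knives
    DerivationRule("f", "ves", "f", "vz"),   # leaf -> leaves
    DerivationRule("", "s", "", "s"),        # cat -> cats
]
lexicon = {"knife": "naIf", "leaf": "li:f", "cat": "k{t"}  # base forms only
index = lemmatization_rules(rules)
print(transcribe("knives", lexicon, index))  # naIvz
```

With this layout, inflected forms such as *knives* or *leaves* need no lexicon entries of their own; storing only base forms is where the memory saving reported in the abstract comes from. A real system would also have to rank competing bases when several inverted rules succeed.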
Similar resources
Steps and Procedure for Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System
Speech databases are a component of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two Persian speech databases, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their relative advantages. ...
Data Driven Approaches to Phonetic Transcription with Integration of Automatic Speech Recognition and Grapheme-to-Phoneme for Spoken Buddhist Sutra
We propose a new approach for performing phonetic transcription of text that utilizes automatic speech recognition (ASR) to help traditional grapheme-to-phoneme (G2P) techniques. This approach was applied to transcribe Chinese text into Taiwanese phonetic symbols. By augmenting the text with speech and using automatic speech recognition with a sausage searching net constructed from multiple pro...
Using speech recognition technique for constructing a phonetically transcribed Taiwanese (Min-Nan) text corpus
Collecting a Taiwanese text corpus with phonetic transcription is complicated by multiple pronunciation variants. By augmenting the text with speech, and using automatic speech recognition with a sausage searching net constructed from the multiple pronunciations of the text corresponding to its speech utterance, we are able to reduce the effort for phonetic transcription. By using ...
Automatic generation of phonetic transcriptions for large speech corpora
We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...